Physiological Measurement
IOP Publishing
Preprints posted in the last 30 days, ranked by how well they match Physiological Measurement's content profile, based on 12 papers previously published here. The average preprint has a 0.02% match score for this journal, so anything above that is already an above-average fit.
Chen, Y.; Ketheeswaranathan, V.; Fordington, S.; Baxter, L.; Stevens, F.; Zandvoort, C. S.; Gawthorpe, R.; Villarroel, M.; Berthouze, L.; Hartley, C.
Background: Apnoea of prematurity is common and may cause desaturation and/or bradycardia. There is marked variability in infants' cardiorespiratory responses to apnoea, despite standardised clinical thresholds. Factors influencing apnoea-related cardiorespiratory instability, and whether instability can be predicted, warrant investigation. Methods: 181,511 apnoeas >5 seconds were identified from continuous physiological recordings of 146 preterm infants <37 weeks postmenstrual age. Cardiorespiratory instability was defined as bradycardia (>30% heart rate reduction) and/or oxygen desaturation (<85%). Mixed-effects models assessed clinical, demographic and dynamic modulators of the relationship between apnoea duration and cardiorespiratory instability. Machine learning (XGBoost) was used to train models to predict apnoea-related cardiorespiratory instability. Results: Longer apnoeas were associated with increased instability, although variability was substantial: 3.6% of apnoeas <10 seconds were associated with cardiorespiratory instability, while 61.2% of apnoeas ≥20 seconds were not. Multiple clinical/demographic (postmenstrual and gestational age, sex, weight z-score, and ventilation mode) and dynamic (baseline heart rate, oxygen saturation, and recent apnoea clustering) factors were associated with increased instability risk. Apnoea-related cardiorespiratory instability could be predicted with a balanced test accuracy of 75.8% when incorporating all features, while a model using only clinical/demographic features achieved 66.0%. Conclusions: Multiple factors influence cardiorespiratory responses to apnoea. Predictive modelling may enable personalised apnoea definitions, improving individualised care.
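A minimal sketch of the instability label and the balanced-accuracy metric used above. It assumes the >30% heart rate reduction is measured from a pre-apnoea baseline to the nadir during the event; the helper names `is_unstable` and `balanced_accuracy` are hypothetical, not taken from the preprint.

```python
def is_unstable(baseline_hr: float, nadir_hr: float, nadir_spo2: float) -> bool:
    """Hypothetical label: bradycardia (>30% HR drop from baseline) and/or SpO2 < 85%."""
    hr_drop = (baseline_hr - nadir_hr) / baseline_hr  # fractional heart rate reduction
    bradycardia = hr_drop > 0.30
    desaturation = nadir_spo2 < 85.0
    return bradycardia or desaturation


def balanced_accuracy(y_true, y_pred) -> float:
    """Mean of sensitivity and specificity for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    pos = sum(1 for t in y_true if t)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


# A 160 -> 100 bpm drop (37.5%) counts as bradycardia; 160 -> 120 bpm (25%) does not.
print(is_unstable(160, 100, 92), is_unstable(160, 120, 90))
```

Balanced accuracy is the natural headline metric here because unstable apnoeas are the minority class, so plain accuracy would reward a model that rarely flags instability.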
Tuttle, M.; Maas, C. C. H. M.; An, J.; Wessler, B. S.; Harvey, W. F.; Selker, H. P.; van Klaveren, D.; Kent, D. M.
The Epic Sepsis Model version 2 (ESMv2) is a prediction model embedded into the electronic medical record that warns clinicians which hospitalized patients are at risk for sepsis. We conducted a retrospective cohort study of 31,951 hospitalizations of 25,760 patients to compare analyses conducted at the commonly used patient level (where the maximum prediction prior to sepsis onset is used to measure performance) versus a novel prediction level (where every prediction is used to measure performance). Sepsis, defined by the Sepsis-3 criteria, occurred during 1,049 hospitalizations (3.3%). Patient-level analyses suggested excellent discrimination (AUC 0.86 [IQR 0.85, 0.87]), whereas prediction-level analyses demonstrated lower performance (AUC 0.62 [IQR 0.57, 0.65]). Low estimates of the positive predictive value (14.5% at the patient level vs 4% at the prediction level) imply a high number of false alerts. Common evaluation approaches may overstate the performance of dynamic prediction models and mislead clinical decision-making.
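The gap between the two evaluation levels can be illustrated on toy data. This sketch assumes each prediction simply inherits its hospitalization's sepsis label and that scores for septic stays are already truncated at onset; the preprint may label predictions differently (for example, with a time window before onset), and the function names are hypothetical.

```python
from itertools import product


def auc(scores_pos, scores_neg):
    """Rank-based AUC: probability a positive outscores a negative (ties count 0.5)."""
    wins = 0.0
    for p, n in product(scores_pos, scores_neg):
        wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(scores_pos) * len(scores_neg))


def patient_level_auc(stays):
    """One score per stay: the maximum prediction (pre-onset for septic stays)."""
    pos = [max(s["scores"]) for s in stays if s["sepsis"]]
    neg = [max(s["scores"]) for s in stays if not s["sepsis"]]
    return auc(pos, neg)


def prediction_level_auc(stays):
    """Every prediction is scored; each inherits its stay's label (assumption)."""
    pos = [x for s in stays if s["sepsis"] for x in s["scores"]]
    neg = [x for s in stays if not s["sepsis"] for x in s["scores"]]
    return auc(pos, neg)


stays = [
    {"sepsis": True, "scores": [0.1, 0.2, 0.9]},
    {"sepsis": False, "scores": [0.1, 0.3, 0.4]},
    {"sepsis": False, "scores": [0.2, 0.2, 0.5]},
]
print(patient_level_auc(stays), prediction_level_auc(stays))
```

Taking the maximum per stay discards the many low, uninformative scores issued before a septic patient deteriorates, which is why the patient-level AUC here is 1.0 while the prediction-level AUC is below 0.5.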
Chuma, A. T.; Wang, C.; Voigt, J.-U.; Mekonnen, D.; Asmare, M. H.; Vanrumste, B.
Rheumatic heart disease (RHD) remains a major public health concern across low- and middle-income countries in the Global South. Early detection through community-based screening of asymptomatic individuals has been identified as a critical strategy for reducing the disease burden. Despite this, the absence of accessible, automated population screening tools continues to impede implementation at scale. This study investigates the screening potential of integrating electrocardiography (ECG) and phonocardiography (PCG) for the early detection of RHD in asymptomatic schoolchildren. The dataset was obtained as part of an ambulatory screening initiative conducted across multiple school sites in rural areas of Ethiopia. It comprised ECG and PCG recordings from 611 asymptomatic schoolchildren aged 10 to 20 years. A comprehensive set of time-frequency, visibility graph and non-linear features was extracted from both signal modalities. These features were then evaluated using machine learning models to assess their utility in the automated screening of early RHD. For multimodal ECG and PCG signals, the best model achieved average 10-fold cross-validation sensitivity, positive predictive value and F1-score of 59.6%, 63.6% and 60.8%, respectively, whereas separate evaluation gave F1-scores of 61.1% for ECG and 23.5% for PCG. Key ECG features included the T-wave, the area under the QRS complex, and entropy measures derived from beat visibility graphs. Relevant PCG features included visibility graph features from multi-band S1 and S2 heart sound segments, along with MFCCs. However, PCG alone performed poorly and did not improve on the ECG features. Although auscultation is a key clinical diagnostic tool in symptomatic RHD, combining PCG with ECG features did not enhance asymptomatic RHD detection over the ECG modality alone.
Vanegas Mueller, E.; Joe-Oshodi, A.; Banerjee, A.; Villarroel, M.
Cardiovascular disease is the leading cause of death worldwide. Sudden cardiac death (SCD) accounts for roughly 50% of all cardiac deaths. The electrocardiogram (ECG) is widely used for early diagnosis of cardiac disease; however, the complexity of accurate interpretation limits its efficacy. Modern deep learning methods have been applied to assist clinicians in diagnosis. We applied Neural Architecture Search (NAS), an automated machine learning technique, to identify optimal deep learning architectures for classifying cardiac arrhythmias from ECGs. Specifically, we applied the Differentiable Architecture Search strategy to an AutoFormer search space to identify optimal self-attention architectures for arrhythmia classification. We trained, validated, and tested the resulting model on the PhysioNet Challenge 2021 dataset (n = 88,253), comprising ECGs from three continents. We then performed hyperparameter optimisation on the NAS output, exploring input patch size, class weighting, and loss function. We evaluated performance using the PhysioNet Challenge metric and the area under the receiver operating characteristic curve (AUROC). The NAS converged towards minimal architectural configurations (embedding dimension: 384, depth: 4, self-attention heads: 4, MLP ratio: 1) with a validation challenge metric of 0.66 (PhysioNet Challenge 2021 winner: 0.63). The NAS-created network achieved an AUROC of 0.97 and a challenge metric of 0.71 during testing. Normal Sinus Rhythm and Sinus Tachycardia achieved AUROCs of 0.99. Low-QRS Voltage and T-wave abnormality were the worst-performing classes, with AUROCs of 0.89 and 0.90, respectively. We infer that architectural simplicity drives performance in arrhythmia classification. Because SCD is unexpected, prevention strategies in free-living environments require lightweight computational resources suitable for wearable devices.
Class imbalance fundamentally limits classification performance for rare arrhythmias such as Low-QRS Voltage and T-wave inversion, irrespective of hyperparameter choices. However, the self-attention mechanism can autonomously abstract clinical representations, simplifying clinical deployment by eliminating the need for an explicit feature-extraction pipeline.
Rao M, S.; Khezrimotlagh, D.
Non-invasive wrist pulse monitoring has been integrated into various medical systems for cardiovascular assessment. However, different definitions of pulse transit time (PTT) are used in the literature, and their statistical behavior when measured locally at the wrist using pressure sensors has not been systematically examined. Wearable wristbands designed to measure PTT have emerged as valuable tools for evaluating cardiac activity. While several algorithms have been developed to predict blood pressure using PTT, it is well recognized that PTT and its inverse parameter, pulse wave velocity (PWV), exhibit temporal variability. In this study, PTT was measured at the radial artery of the wrist to investigate its statistical variation and its relationship with different arterial pressures. Two PTT computation methods were compared: onset-based and peak-based. Data were recorded at five cuff pressure levels (20, 40, 60, 80, and 100 mmHg) using a pulse pressure sensor (PPS). PTTonset showed a lower coefficient of variation than PTTpeak across cuff pressures up to 100 mmHg. Only a weak correlation was observed between the PTT values. However, dynamic time warping (DTW) analysis revealed notable similarity between the PTTonset and PTTpeak time series, regardless of the applied pressure level. For the multi-participant dataset, mean DTW distances ranged from 0.029 to 0.046 across the tested cuff pressures, indicating consistent similarity between PTTonset and PTTpeak over time. The objective of this study is to examine the statistical behavior, stability, and temporal similarity of the two commonly used PTT definitions when measured at the radial artery using pressure sensors. Statistical analysis shows consistent differences between the two PTT definitions across participants.
However, PTTpeak requires simpler computation and produces fewer detection errors, while PTTonset provides lower statistical variation.
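The two statistics compared above, the coefficient of variation and the DTW distance, can be sketched as follows. This is a textbook DTW with an absolute-difference local cost; the abstract does not say how its DTW distances (0.029 to 0.046) were normalized, so raw values from this sketch are not directly comparable, and the helper names are hypothetical.

```python
import math


def coefficient_of_variation(x):
    """Sample standard deviation divided by the mean."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / (n - 1)
    return math.sqrt(var) / mean


def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor cells
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```

The call above returns 0.0 because the repeated sample is absorbed by warping; this tolerance to local timing shifts is what lets DTW report similar beat-to-beat PTT trends even when the sample-by-sample correlation is weak.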
Chen, P.-W.; Cielo, C.; Walsh, O.; Mcdonald, M.; Song, P. X.; Goldstein, C.; Moreno, J. P.; Jansen, E.; Mitchell, J. A.
Introduction: Actigraphy sleep-wake classification methods increasingly seek to leverage raw acceleration data and machine-learning-based classification, but performance evaluation in pediatrics is limited. We trained machine-learning models using pediatric data and compared their sleep-wake classification performance with that of existing algorithms for children. Methods: Sixty-five children (46% female, ages 5.3 to 17.7 years) completed in-lab overnight polysomnography and wore a GENEActiv device on their non-dominant wrist. The acceleration data were converted into 30-second epochs and aligned with physician-scored sleep-wake data from electroencephalography. Seven machine-learning models were trained using leave-one-subject-out cross-validation. Epoch-by-epoch analyses generated performance metrics (e.g., balanced accuracy [BA]), and discrepancy analyses provided overall sleep duration bias estimates. Models were ranked by a Euclidean distance score combining performance and bias, where a lower score indicates performance closer to perfect and bias closer to zero. For benchmarking, we included GGIR sleep scoring algorithms and an adult-trained random forest classifier. Results: Overall, 560.1 hours of polysomnography and actigraphy data were collected (74.4% of epochs were scored as sleep). The pediatric-trained local-global long short-term memory (LSTM) classifier had the best epoch-by-epoch performance (e.g., BA=0.85, sensitivity=0.88, specificity=0.83, ROC-AUC=0.95, and Cohen's kappa=0.67). These metrics exceeded those of the adult-trained random forest classifier and the GGIR-based algorithms. Discrepancy analyses revealed that overall sleep duration was underestimated by an average of 25 minutes using the LSTM classifier, with no proportional bias. Conclusion: We trained seven pediatric sleep-wake classifiers with strong ability to detect sleep and wake, with the LSTM classifier performing best.
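The abstract does not give the exact ranking formula, so the following is only a sketch of the general idea: each performance metric's distance from 1 and the sleep-duration bias, divided by a hypothetical normalizing scale of 60 minutes, are combined in a Euclidean norm. The function names, the metric set and `bias_scale` are all assumptions.

```python
import math


def distance_score(metrics, bias_minutes, bias_scale=60.0):
    """Euclidean distance from perfect metrics (all 1.0) and zero bias.

    metrics: performance metrics on a 0-1 scale (e.g. BA, sensitivity, specificity).
    bias_minutes: mean sleep-duration bias; bias_scale is a hypothetical normaliser.
    """
    perf = sum((1.0 - m) ** 2 for m in metrics)
    return math.sqrt(perf + (bias_minutes / bias_scale) ** 2)


def rank_models(results, bias_scale=60.0):
    """results maps model name -> (metrics, bias_minutes); best (lowest score) first."""
    return sorted(results, key=lambda name: distance_score(*results[name], bias_scale=bias_scale))


print(rank_models({"a": ([0.8], 10.0), "b": ([0.85], -25.0)}))
```

The choice of `bias_scale` decides how many minutes of bias trade off against a given loss in accuracy, so any real use would need the preprint's own normalization.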
Fabry, B.; Kuster, C.; Francis, R.
The endotracheal tube resistance dominates the total airway resistance in most intubated patients. Mucus deposition and biofilm formation can rapidly increase tube resistance and thereby contribute to serious ventilatory impairments, including dynamic hyperinflation, intrinsic PEEP build-up, added work of breathing, and patient-ventilator asynchrony. During controlled mechanical ventilation, an increased tube resistance can be inferred from the difference between peak and plateau pressure, but this approach fails during pressure-supported spontaneous breathing. Here, we present a method that estimates the linear and nonlinear components of tube resistance from naturally occurring airway pressure and flow fluctuations at the airway opening, without a tracheal pressure sensor and without applying mandatory forced oscillations. This is achieved by solving the equation of motion using band-pass filtered airway pressure and flow signals. Band-pass filtering isolates the relevant resistive and inertive pressure losses across the tube by removing slow contributions from muscle pressure and lung elastance as well as high-frequency noise. The method accurately recovers both linear and nonlinear tube resistance parameters with < 10% error and < 2% bias. Moreover, it enables real-time implementation of full Automatic Tube Compensation (ATC), even in the presence of severe tube obstructions. Continuous estimation of endotracheal tube resistance from naturally occurring airway pressure and flow fluctuations enables real-time detection of clinically relevant tube narrowing and may help improve patient safety, reduce patient-ventilator asynchrony, and facilitate weaning.
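One plausible reading of the band-pass approach, sketched under stated assumptions rather than as the authors' implementation: the pressure drop across the tube is modelled as K1·V' + K2·V'|V'| + I·V'' (a Rohrer-type resistance plus inertance), and the same 2–20 Hz band-pass filter is applied to the airway-opening pressure and to each regressor. Slow components (PEEP, elastic recoil, muscle pressure) are then removed before ordinary least squares. The pass band, filter order and synthetic parameter values are hypothetical. As a simplification, the synthetic tracheal pressure contains only slow components.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def estimate_tube_parameters(p_aw, flow, fs, band=(2.0, 20.0), order=4, trim_s=2.0):
    """Least-squares fit of P = K1*V' + K2*V'|V'| + I*V'' on band-passed signals.

    p_aw: airway-opening pressure (cmH2O); flow: V' (L/s); fs: sampling rate (Hz).
    Returns (K1, K2, I). The model form and default band are assumptions.
    """
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    accel = np.gradient(flow, 1.0 / fs)  # V'': rate of change of flow
    regressors = np.column_stack([flow, flow * np.abs(flow), accel])
    # The same linear filter is applied to both sides of the equation of motion,
    # so slow terms (PEEP, elastic recoil, muscle pressure) drop out of the fit.
    p_f = sosfiltfilt(sos, p_aw)
    x_f = sosfiltfilt(sos, regressors, axis=0)
    n = int(trim_s * fs)  # discard filter edge transients
    coef, *_ = np.linalg.lstsq(x_f[n:-n], p_f[n:-n], rcond=None)
    return tuple(float(c) for c in coef)


# Synthetic example: slow breathing plus small, fast flow fluctuations.
fs = 100.0
t = np.arange(0.0, 60.0, 1.0 / fs)
flow = 0.5 * np.sin(2 * np.pi * 0.25 * t) + 0.05 * (
    np.sin(2 * np.pi * 3.0 * t) + np.sin(2 * np.pi * 5.3 * t) + np.sin(2 * np.pi * 8.1 * t)
)
K1, K2, I = 4.0, 6.0, 0.02  # hypothetical tube parameters
tube_drop = K1 * flow + K2 * flow * np.abs(flow) + I * np.gradient(flow, 1.0 / fs)
# Simplification: tracheal pressure = PEEP + elastic recoil of the slow breath only.
slow_volume = -0.5 / (2 * np.pi * 0.25) * np.cos(2 * np.pi * 0.25 * t)
p_tr = 5.0 + 20.0 * slow_volume
est = estimate_tube_parameters(p_tr + tube_drop, flow, fs)
print(est)  # close to (4.0, 6.0, 0.02)
```

Because filtering is linear, it can be applied to each regressor separately without distorting the coefficients. The nonlinear term stays identifiable because the fast fluctuations are amplitude-modulated by the slow breathing flow.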
Chuma, A. T.; Wang, C.; Asmare, M. H.; Varon, C.; Voigt, J.-U.; Kassie, D. M.; Zuhlke, L.; Vanrumste, B.
Early detection of rheumatic heart disease (RHD) is essential to reducing its associated mortality and late complications. In resource-limited settings, automated detection using low-cost electrocardiogram (ECG) sensors can enhance prevention efforts. However, its effectiveness as an RHD screening tool in at-risk populations remains unexplored. This study aimed to investigate the utility of machine learning for classifying RHD in a cohort screened using low-cost ECG devices. ECGs were collected with KardiaMobile from 611 at-risk schoolchildren, of whom 47 had confirmed RHD and 564 were healthy. First, ECG fiducial points were annotated using a publicly available prominence-based delineator. Then, temporal, frequency, wavelet, and visibility graph-based features were extracted from six leads and fed to an XGBoost classifier. Ten-fold cross-validation was performed at different prediction score thresholds to obtain a target sensitivity (Se) for RHD screening. Single-lead evaluation on Lead II showed an F1-score of 60.9%, a Se of 59.6% and a positive predictive value (PPV) of 62.2%. Using multiple leads improved the results, with an F1-score of 62.8%, a Se of 59.6% and a PPV of 66.7%. The best model performance was achieved by adjusting the threshold to 0.6, giving a Se and PPV of 66% and 51%, respectively. Error analysis revealed that cases with T-wave and ST-T changes, as well as non-rheumatic mitral valve cases, were among the false positives. Machine learning can enhance early detection by leveraging relevant ECG features and a target sensitivity that can be adjusted to screening priorities and resource capacity. Measurements can be obtained without chest contact, using only the fingers and knees, thereby enabling use by non-clinical staff. This approach provides a scalable and cost-effective solution for RHD screening in high-prevalence regions.
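The threshold-tuning step described above can be sketched as a search for the highest score threshold whose sensitivity meets a target. The helpers below are hypothetical and are not the preprint's code. They assume pooled out-of-fold scores and flag a case as positive when its score is at or above the threshold.

```python
def sensitivity_ppv(scores, labels, threshold):
    """Sensitivity and PPV when cases with score >= threshold are flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if y and s >= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if not y and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y and s < threshold)
    se = tp / (tp + fn)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return se, ppv


def threshold_for_target_sensitivity(scores, labels, target_se):
    """Highest candidate threshold reaching target_se; returns (threshold, Se, PPV)."""
    for thr in sorted(set(scores), reverse=True):
        se, ppv = sensitivity_ppv(scores, labels, thr)
        if se >= target_se:
            return thr, se, ppv
    raise ValueError("target sensitivity not reachable")


scores = [0.95, 0.85, 0.75, 0.65, 0.45, 0.35, 0.25]
labels = [1, 0, 1, 1, 0, 0, 1]
print(threshold_for_target_sensitivity(scores, labels, 0.75))
```

Scanning from the highest threshold downward returns the operating point with the fewest flagged cases that still meets the sensitivity target. This makes the Se/PPV trade-off in the abstract explicit: lowering the threshold raises Se at the cost of more false positives.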
Nakano, T.; Saito, K.; Noda, K.; Asai, Y.; Kojima, A.; Uchida, H.; Ohira, Y.; Ito, H.; Kawada, J.-i.; Yoshikawa, T.
Kawasaki disease (KD) is a systemic vasculitis in young children, and early diagnosis remains challenging when clinical features are incomplete or overlap with those of other febrile illnesses. Because electrocardiography (ECG) is noninvasive and widely available, we investigated whether ECG-derived features could help distinguish complete KD from other febrile illnesses in pediatric patients. We conducted a single-center retrospective study of hospitalized febrile children aged 1-8 years who underwent digital 12-lead ECG recording during the initial evaluation. Five amplitude features and six timing features extracted from the ECG were used to develop a logistic regression model to distinguish between complete KD and other febrile illnesses. The model discriminated between the KD and non-KD groups in the validation dataset. The prediction score was not significantly correlated with age or body temperature. S-wave amplitude, the RR interval, and P- and Q-wave amplitudes appeared to contribute to discrimination. These findings suggest that ECG-derived features may provide adjunctive information for distinguishing complete KD from other febrile illnesses.
Author Summary: Kawasaki disease is an inflammatory illness in young children that can lead to coronary artery complications if treatment is delayed. Early diagnosis is often difficult because its initial symptoms overlap with those of many common febrile illnesses. We investigated whether a routine 12-lead electrocardiogram (ECG), which is noninvasive, rapid, and widely available, contains information that can help distinguish complete Kawasaki disease from other febrile conditions. We retrospectively analyzed digital ECGs from hospitalized febrile children and extracted waveform amplitude and timing features. Using these features, we built a logistic regression model and evaluated it in a temporally separate validation cohort. The model distinguished patients with Kawasaki disease from other febrile patients.
P-, Q-, and S-wave amplitudes and the RR interval were repeatedly selected as important contributors, suggesting that both waveform morphology and heart-rate-related information may be relevant. These findings indicate that ECG-derived features may provide useful adjunctive information during the clinical assessment of complete Kawasaki disease.
Bakumenko, A.; Smith, D. H.; Hoelscher, J.
Earlier ICU mortality prediction is more clinically useful because it can identify high-risk patients while treatment decisions can still change. Yet most models are trained on data from a fixed time window, so it is unclear whether a model trained on the first 48 hours of ICU data remains reliable when used earlier in the ICU stay. We evaluated a multimodal ICU mortality model trained once at 48 hours and then applied unchanged at 6, 12, 24, and 48 hours on MIMIC-III. The model combines an LSTM for physiological time-series data, a finetuned ClinicalModernBERT model for clinical notes, and a logistic regression fusion layer. Performance remained strong at earlier time points, suggesting that useful mortality prediction is possible earlier in the ICU stay even without retraining. At 6 hours, the model achieved AUROC 0.777 and remained well-calibrated (ECE 0.038) without any recalibration, and it outperformed both single-modality models at every horizon. The multimodal benefit was most evident at earlier horizons, when physiological data were sparse: agreement between the two specialists dropped by more than half from 48 to 6 hours, while the median contribution from clinical notes increased from 37% to 49%. A Bayesian version of the fusion layer showed that uncertainty decreased for survivors as more data accumulated but remained high for non-survivors; the most uncertain cases were up to 4.9 times more likely to be non-surviving patients. Continuous hourly analyses further showed that clinical notes provide stable context between documentation events. Simply carrying forward the most recent note matched or outperformed note-decay and documentation-gap alternatives. These results suggest that a multimodal ICU mortality model trained on 48 hours of data can provide trustworthy earlier predictions without retraining, while also identifying the cases that remain hardest to interpret.
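For readers unfamiliar with the calibration figure quoted above (ECE 0.038), the sketch below shows expected calibration error as it is commonly defined for a binary risk model. The predictions are grouped into equal-width probability bins, and the gap between mean predicted risk and observed event rate is averaged across bins, weighted by bin size. The 10-bin choice is an assumption, since the preprint's binning is not stated.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Bin-size-weighted mean |observed event rate - mean predicted risk|."""
    n = len(probs)
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, y))
    ece = 0.0
    for b in bins:
        if b:
            mean_risk = sum(p for p, _ in b) / len(b)
            event_rate = sum(y for _, y in b) / len(b)
            ece += len(b) / n * abs(event_rate - mean_risk)
    return ece


print(expected_calibration_error([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1]))
```

A model can discriminate well and still be poorly calibrated. ECE is therefore reported alongside AUROC when predicted probabilities are meant to be read as absolute risks.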
Arshad, A.; Carey, K. A.; Daniels, L. A.; Jani, P.; Gilbert, E.; Sanchez-Pinto, L. N.; Mayampurath, A.
Objective: Readmissions to the PICU are associated with increased morbidity and mortality. A prediction model that can identify children at risk of readmission at the time of transfer can allow providers to intervene and potentially improve patient outcomes. The objective of this study was to derive and validate machine learning models to predict PICU readmission at the time of transfer. Design: Retrospective observational cohort study. Setting: Three quaternary care PICUs in the city of Chicago. Patients: All children admitted to the PICU between 2012 and 2019. Measurements: The primary outcome was unplanned readmission to the PICU within 48 hours of transfer to the inpatient ward. Predictor variables included vital signs, patient characteristics, and laboratory results. We developed and externally validated four models to predict PICU readmission: logistic regression, elastic net, random forest, and XGBoost. Main Results: This study included 35,601 patients, with readmission rates ranging from 2.2% to 3.7% by site. Model performance during internal validation was consistent across the three sites, with area under the receiver operating characteristic curve (AUC) values between 0.70 and 0.73 and no difference across the four models. Model performance decreased significantly during external validation (AUCs of 0.60-0.69). The variables most important to the prediction differed at each site. Conclusion: Machine learning models for predicting readmissions to the PICU have limited generalizability. Locally derived models demonstrated modest performance in our study and could potentially inform provider decision-making if prospectively validated. Externally developed models are unlikely to perform well at predicting PICU readmissions.
Bender, J.; Stoks, J.; Barrios Espinosa, C.; Becker, S.; Cluitmans, M. J. M.; Loewe, A.
Background and Aims: Clinical interpretation of the precordial leads V1-V6 assumes that Wilson's central terminal (WCT) has a fixed anatomical location. Consequently, a positive signal is taken to correspond to electrical activation spreading from WCT towards the respective electrode, and vice versa. However, the location of WCT has never been systematically investigated, although a better understanding of it could improve the interpretation of the precordial leads. This work aims to characterize the spatial extent and location of the physical WCT, i.e., the electrical potential defined by the WCT, on the body surface during the P-wave. Methods: A detailed analysis of body surface potential maps (BSPMs) during atrial depolarization was conducted in an in silico patient cohort and in clinical data. Results: During the P-wave, the location of WCT was not stationary; its spatial extent and location varied across time as well as across individuals. Four distinct spatial patterns of WCT distribution on the body surface were identified in silico, three of which were also found in the clinical cohort. WCT signals agreed with BSPM signals at commonly assumed positions of WCT only for a small fraction of the P-wave. Conclusion: The spatial extent and location of WCT change during the P-wave and should therefore be considered when interpreting the precordial leads.
Briston, S. J.; Eisner, D. A.; Dibb, K. M.; Venetucci, L. A.; Trafford, A. W.
Drug-induced inhibition of the delayed rectifier potassium current (IKr) predisposes to early afterdepolarisations (EADs) and cardiac arrhythmias. Here, we sought to determine the contribution of action potential duration (APD), APD variability and spontaneous calcium release from the sarcoplasmic reticulum (SR) to the formation of EADs. In isolated sheep ventricular myocytes, EADs were induced by combined inhibition of IKr with dofetilide and β-adrenergic stimulation. The onset of EADs was preceded by increased beat-to-beat variability of APD. To isolate the role of APD in EAD initiation, the SR was depleted of calcium with caffeine. The first beat post-caffeine was associated with prolonged APD but not an EAD. During β-AR stimulation, increasing ryanodine receptor (RyR) open probability had no effect on APD but increased APD variability and induced both EADs and delayed afterdepolarisations (DADs). Targeting RyR open probability with K201 reversibly abolished afterdepolarisations. APD variability was a better predictor of EADs than APD alone. During an EAD, changes in [Ca2+]i preceded membrane depolarisation and took the form of calcium sparks. In silico modelling demonstrated that membrane time constant effects account for the delay between changes in [Ca2+]i and membrane potential. In summary, in a drug-induced model of action potential prolongation with β-AR stimulation, EADs are preceded by increased APD variability and an increase in Ca2+ sparks. Targeting SR function abolishes EADs. These results suggest a key role for SR Ca2+ overload in the formation of EADs and indicate that EADs and DADs share common mechanisms.
Key Points:
- Drugs that prolong the cardiac action potential and ECG QT interval are a major cause of early afterdepolarisations and of the dangerous ventricular arrhythmias they initiate.
- Prolongation of the action potential is widely assumed to be the primary driver of these events.
- We show that early afterdepolarisations are instead preceded by increased beat-to-beat variability of action potential duration, and that this variability has better sensitivity and specificity for early afterdepolarisations than action potential duration.
- Small, spontaneous calcium release events known as calcium sparks occur before the membrane depolarisation that drives early afterdepolarisations.
- Suppressing calcium release from the sarcoplasmic reticulum abolishes early afterdepolarisations, identifying calcium handling instability as a potential key mechanism of drug-induced arrhythmia.
Haq, K.; Berul, C.; Posnack, N.
Background: Traditional heart rate (HR)-adjusted QT correction (QTc) formulae often fail to eliminate the inverse HR-QT interval relationship, particularly in pediatric patients. In this study, we optimized our previously published adaptive QTc (QTcAd) formula by including additional demographic variables and broadening the pediatric age range. We tested the hypothesis that QTcAd improves congenital long QT syndrome (congenital LQTS) detection performance and reduces erroneous classifications across pediatric cohorts. Methods: We retrospectively analyzed 8,306 ECGs from 4,556 cardiovascular disease (CVD)-free pediatric patients. For neonatal patients (1-30 days old), we derived daily QTcAd parameter values. For older patients, we developed regression models to estimate QTcAd parameters (mean HR = -15.9 ln(days) + 219; |m| = 0.0001 × days + 1, where |m| is the absolute HR-QT regression slope). To support LQTS screening, we constructed dynamic QTcAd thresholds by estimating age-specific reference limits. Diagnostic performance was tested in a clinically confirmed LQTS cohort (n=137), and further evaluated in the Pediatric Heart Network (PHN; n=2,394) and Emergency Department (ED; n=2,002) cohorts. Results: Using the confirmed LQTS cohort as the event population and the CVD-free cohort as the non-event population, QTcAd demonstrated higher sensitivity than Bazett's correction (QTcB) (92% vs 46.7%). QTcAd maintained high specificity (96.9% vs 98.9%), resulting in a higher Youden index (0.889 vs 0.456). In the PHN healthy cohort, both QTc formulae classified the majority of individuals as normal (QTcAd 95%; QTcB 98.2%), indicating few false positives. In the ED cohort, QTcAd reduced borderline/prolonged QTc classifications requiring follow-up, yielding 270 fewer repeat-testing triggers than QTcB. We developed a publicly accessible calculator to compute QTcAd and classify congenital LQTS risk. Conclusion: We developed and validated an enhanced QTcAd formula for pediatric patients.
QTcAd-based, age-adjusted dynamic thresholding improved performance for congenital LQTS screening while maintaining high specificity, reducing false-positive LQTS classifications and repeat ECGs and thereby decreasing unnecessary downstream clinical evaluation.
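The quantities reported above can be checked with a short sketch. It encodes the two published regressions for QTcAd parameters in patients older than 30 days (reading `days` as age in days), Bazett's correction QTcB = QT/√RR with RR in seconds, and the Youden index J = sensitivity + specificity − 1. The adaptive QTc formula itself is not given in the abstract and is not reproduced here; the function names are hypothetical.

```python
import math


def qtcad_mean_hr(days: float) -> float:
    """Estimated QTcAd mean-HR parameter from age in days (regression for >30 days)."""
    if days <= 30:
        raise ValueError("regression reported for patients older than 30 days")
    return -15.9 * math.log(days) + 219.0


def qtcad_abs_slope(days: float) -> float:
    """Estimated absolute HR-QT regression slope |m| from age in days (>30 days)."""
    if days <= 30:
        raise ValueError("regression reported for patients older than 30 days")
    return 0.0001 * days + 1.0


def qtc_bazett(qt_ms: float, rr_s: float) -> float:
    """Bazett correction, QTcB = QT / sqrt(RR), with RR in seconds."""
    return qt_ms / math.sqrt(rr_s)


def youden_index(sensitivity: float, specificity: float) -> float:
    return sensitivity + specificity - 1.0


print(round(youden_index(0.92, 0.969), 3))   # 0.889 (QTcAd)
print(round(youden_index(0.467, 0.989), 3))  # 0.456 (QTcB)
```

The Youden values reproduce the abstract's 0.889 and 0.456. This confirms that QTcAd's advantage comes from its much higher sensitivity, which outweighs a two-point loss in specificity.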
Koumantakis, E.; Remoundou, K.; Fava, C.; Roussaki, I.; Visconti, A.; Berchialla, P.
Intensive Care Unit (ICU) readmissions are associated with adverse clinical outcomes and increased healthcare costs. Although existing models for predicting 30-day ICU readmission show high predictive performance, they fail to account for model uncertainty, potentially resulting in overconfident and unreliable decision-making. We propose a novel Ensemble Bayesian Model Averaging (EBMA)-based framework which balances predictive discrimination with uncertainty by penalizing models that are confident but incorrect. It achieved excellent calibration (Brier score = 0.051), while maintaining discriminatory performance comparable to or exceeding that of the best individual models (AUROC > 0.716). These findings suggest that our EBMA-based framework provides a more robust and clinically reliable approach for ICU readmission prediction and decision support.
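A heavily simplified sketch of the two ideas above: the Brier score, and mixture weights estimated by expectation-maximization over member models' predicted probabilities. In this scheme a model that is confident but wrong assigns low likelihood to the observed outcomes and is therefore down-weighted. This omits the calibration step of full EBMA and is not the authors' framework; all names and toy values are hypothetical.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def ebma_weights(preds, outcomes, n_iter=200, eps=1e-12):
    """EM estimate of mixture weights; preds[k][i] is model k's probability for case i."""
    n_models, n = len(preds), len(outcomes)
    w = [1.0 / n_models] * n_models
    for _ in range(n_iter):
        resp_sums = [0.0] * n_models
        for i, y in enumerate(outcomes):
            # E-step: Bernoulli likelihood of the observed outcome under each model
            lik = [max(preds[k][i] if y else 1.0 - preds[k][i], eps) for k in range(n_models)]
            total = sum(w[k] * lik[k] for k in range(n_models))
            for k in range(n_models):
                resp_sums[k] += w[k] * lik[k] / total
        w = [s / n for s in resp_sums]  # M-step: average responsibility per model
    return w


def ensemble_predict(preds, w):
    return [sum(wk * p[i] for wk, p in zip(w, preds)) for i in range(len(preds[0]))]


outcomes = [1, 0, 1, 0]
preds = [
    [0.8, 0.2, 0.7, 0.3],      # moderately confident, always on the right side
    [0.01, 0.99, 0.99, 0.01],  # very confident, but wrong on half the cases
]
w = ebma_weights(preds, outcomes)
print(w, brier_score(ensemble_predict(preds, w), outcomes))
```

Likelihood-based weighting penalizes overconfident errors heavily. This is the behaviour the abstract describes, although a full EBMA would also calibrate each member before averaging.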
Tang, G.; Li, X.; Xiao, Y.; Wang, K.; Wu, M.; Wei, Z.; Yu, M.; Chen, X.; Hong, W.; Cheng, F.; Li, X.; Zhang, J.; Wu, X.; Hong, S.
Hypokalemia is a common and potentially life-threatening electrolyte abnormality in emergency care, yet rapid noninvasive screening remains difficult in time-critical triage settings. We developed PocketED-K, a single-lead AI-ECG prescreening model initialized from ECGFounder, and evaluated it in retrospective multicenter cohorts and a prospective handheld pilot. Retrospective development and validation included 37,115 patients from MC-MED and MIMIC-ED, and the pilot enrolled 18 patients at Peking University First Hospital. Hypokalemia was defined as venous serum potassium < 3.5 mmol/L. PocketED-K achieved AUROCs of 0.8189 (95% CI 0.8172–0.8207) in internal testing, 0.8104 (95% CI 0.8092–0.8115) in temporal validation, and 0.7889 (95% CI 0.7692–0.8074) in independent external validation; external negative predictive value was 0.9911 (95% CI 0.9895–0.9925). Higher predicted risk was associated with ST-segment depression, T-wave flattening or inversion, and relative U-wave prominence. The prospective handheld pilot provided an initial signal of workflow feasibility in real-world acquisition. These findings support single-lead AI-ECG as a low-burden prescreening tool to prioritize potassium testing in emergency care.
Mankowski, I.; Pinter, E.; Lee, I.-M.; Raetsch, G.; Demler, O.
Maximal oxygen consumption (VO2max) is the gold standard for cardiorespiratory fitness but requires resource-intensive physical testing. Recent reports show that machine learning models can extract additional information from ECGs, yet the potential of the ECG as a source of physiological metrics remains underutilized. While routinely collected resting electrocardiograms (ECGs) provide an opportunistic window into cardiorespiratory fitness, current deep learning models often struggle with cross-cohort transferability or remain dependent on active exercise data. We developed population-specific models using the UK Biobank to estimate VO2max derived from submaximal exercise (N = 8,540) and a panel of other physiological metrics (sample sizes up to N = 78,265) from resting 12-lead ECGs using Patient Contrastive Learning of Representations (PCLR), an AI-based tool that converts an ECG into a set of 320 features (ECG-PCLR). Data were split 80%:20% (training:test), and models were evaluated on the held-out test subset. We demonstrate that ECG-PCLR embeddings alone can estimate submaximal VO2max and body fat percentage with Pearson correlations (r) of 0.61 and 0.65, respectively. They also estimate systolic blood pressure, forced expiratory volume in 1 second (FEV1), and grip strength with r values from 0.31 to 0.55. Adding ECG embeddings to basic predictors (age, sex and BMI) improves submaximal VO2max prediction by an absolute ΔR² of 8%, and prediction of other physiologic parameters by 1% to 13%.
Bressman, E.; Auerbach, A.; Keniston, A.; Jens, C.; Ranji, S.
Introduction: The use of artificial intelligence (AI) by clinicians has increased rapidly in recent years, with large language models (LLMs) emerging as tools that can equal clinician diagnostic performance in simulated settings. However, limited data exist regarding physicians' use of LLMs in real-world clinical practice. This study aimed to evaluate the frequency of LLM use among practicing hospitalists, identify which LLMs are most commonly utilized, and assess hospitalists' perceptions of the benefits and limitations of LLM use in clinical care. Methods: We conducted a cross-sectional survey study of academic hospital medicine faculty across 8 institutions within the Hospital Medicine Reengineering Network (HOMERuN), a collaborative research consortium. Eligible participants included hospitalists practicing within participating HOMERuN sites during the study period. The survey assessed the frequency of LLM use, types of LLMs used, clinical applications, and physician perceptions regarding usefulness, efficiency, and concerns associated with LLM adoption. Results: 170 respondents (67.1%) reported ever using an LLM in clinical practice. Among LLM users, OpenEvidence was the most used tool (88.9%), followed by ChatGPT (58.5%), Google Gemini (26.9%), and Microsoft Copilot (20.5%). Only a minority of hospitalists reported using LLMs daily while seeing patients. The most common use cases of LLMs were answering diagnostic (77.1%) and management (77.6%) questions. A majority also reported using LLMs to identify or summarize primary literature (60.0%). Lack of trust in outputs (49.8%), uncertainty around institutional policies (48.6%), and lack of access to secure applications (43.1%) were cited as the most frequent barriers to using LLMs in practice. Discussion: The use of LLMs in clinical practice is already widespread, though regular or daily use is not yet typical. 
Concerns regarding reliability, patient privacy, and safe integration into clinical workflows remain significant barriers to broader adoption. The responsible implementation of LLMs in hospital medicine will require addressing these barriers.
Chugh, H.; Reinier, K.; Uy-Evanado, A.; Nakamura, K.; Sovari, A. A.; Salvucci, A.; Jui, J.; Chugh, S. S.
Background: The incidence of sudden cardiac arrest (SCA) manifesting as pulseless electrical activity (PEA) has increased, and survival remains extremely low. Methods for early identification and management of high-risk individuals are needed, but no clinical risk scores currently exist to predict PEA-SCA. Our objective was to develop and validate a clinical prediction model for PEA-SCA. Methods: From an ongoing prospective, population-based study of SCA in Portland, Oregon (catchment population ~1 million, 2002-2020), we identified adults with PEA-SCA. Their lifetime clinical records were compared with those of a control group with >50% prevalence of significant coronary disease. Prediction models were constructed using backward stepwise logistic regression in a training dataset (67%) and evaluated in a validation dataset (33%). Model discrimination was assessed using receiver operating characteristic curves (C statistic). External validation was performed in a geographically distinct population in Ventura County, California (population ~850,000, 2015-2022). Results: The final clinical algorithm (PEA-Risk), incorporating 12 clinical, electrocardiogram, and medication variables, demonstrated strong discrimination in the training dataset (C statistic = 0.860 [95% CI: 0.838-0.881]) and remained robust in the internal (C statistic = 0.832 [95% CI: 0.800-0.865]) and external validation datasets (C statistic = 0.704 [95% CI: 0.665-0.743]). Conclusions: We developed and externally validated a clinical algorithm for predicting PEA-SCA. Given the low rates of successful resuscitation after PEA arrest, this risk prediction tool may enable earlier identification and prevention of PEA-SCA.
Clinical Perspective
What is known: The proportion of SCA presenting as PEA is increasing, and survival from these events remains extremely low. There are no available methods for clinical risk prediction of these events.
What the study adds: The present study constructs and replicates a risk score for prediction of SCA manifesting as PEA using widely available clinical and noninvasive markers. These findings have implications for developing prevention and management strategies for individuals at high risk of PEA-SCA.
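The C statistic used to report PEA-Risk discrimination equals the area under the ROC curve. It can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. A minimal pure-Python sketch of that pairwise definition follows; the function name and the scores in the usage line are made up for illustration.

```python
def c_statistic(case_scores, control_scores):
    """Concordance (C) statistic: fraction of case-control pairs ranked correctly.

    Each pair where the case outscores the control counts 1, a tie counts 0.5.
    This O(n*m) loop is for clarity; rank-based formulas scale better.
    """
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(case_scores) * len(control_scores))


# Hypothetical risk scores: 3 of the 4 case-control pairs are concordant.
print(c_statistic([0.9, 0.3], [0.5, 0.1]))
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the training, internal, and external C statistics above can be compared.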
Piorkowska, N. J.; Olejnik, A.; Ostromecki, A.; Kuliczkowski, W.; Mysiak, A.; Bil-Lula, I.
Background: Machine-learning models based on circulating biomarkers are increasingly used in cardiovascular research; however, model performance alone provides limited insight into how the predictive signal is distributed across features. We aimed to characterize the biomarker signal architecture of a machine-learning model distinguishing ST-elevation myocardial infarction (STEMI) from non-ST-elevation myocardial infarction (NSTEMI), with a focus on signal concentration, redundancy, and conditional complementarity. Methods: We conducted a structured secondary analysis of a previously established, leakage-controlled machine-learning framework (n = 152 patients). The BIOMARKERS feature-set variant (10 biomarkers) was evaluated using outer-fold cross-validation. Model structure was interrogated using (i) leave-one-biomarker-out analysis, (ii) pairwise leave-two-out analysis with pair-excess estimation, (iii) cumulative ablation of top-ranked biomarkers, and (iv) forward reconstruction of minimal biomarker panels. Uncertainty was assessed using bootstrap resampling across folds. Results: The full biomarker model achieved a mean ROC-AUC approaching 0.94. The predictive signal was highly non-uniform, with MMP-2 showing the largest single-feature contribution (mean ΔAUC ≈ 0.16). Pairwise analysis identified conditional complementarity between selected non-lipid biomarkers, particularly MMP-2 and EMMPRIN (pair ΔAUC ≈ 0.26; positive excess over single-feature effects), whereas lipid-related markers formed a highly correlated and largely redundant sub-cluster. Cumulative ablation demonstrated rapid performance collapse following removal of top-ranked biomarkers, consistent with structural signal concentration. Forward panel analysis showed that a compact subset of biomarkers (three features) achieved performance within ~0.01 ROC-AUC of the full model, indicating the presence of a minimal high-yield panel. 
Bootstrap confidence intervals suggested that small performance differences should be interpreted with caution. Conclusions: Predictive performance in this biomarker-based model arises from a structured and unevenly distributed signal architecture, characterized by a dominant core biomarker, conditionally complementary contributors, and a redundant lipid cluster. These findings highlight the importance of evaluating model structure, not only aggregate performance, and suggest that biomarker-based machine-learning systems may benefit from architecture-aware interpretation and simplification strategies.
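The leave-one-out and leave-two-out ablations described above can be summarized against a simple additive baseline: a pair shows positive excess when removing both markers costs more AUC than the sum of removing each alone. The abstract does not give the exact pair-excess formula, so the definition below is an assumption; the function name, marker names, and AUC values are hypothetical and are not results from the study.

```python
from itertools import combinations


def ablation_deltas(auc_full, auc_without):
    """Single-feature and pairwise ablation summaries (assumed additive baseline).

    auc_full: AUC of the model with all features.
    auc_without: dict mapping frozenset of removed feature names -> AUC
        of the model retrained without those features.
    Returns (single_delta, pair_excess), where
        single_delta[f]    = auc_full - AUC without f
        pair_excess[(a,b)] = (auc_full - AUC without a and b)
                             - (single_delta[a] + single_delta[b])
    """
    single_delta = {
        next(iter(removed)): auc_full - auc
        for removed, auc in auc_without.items()
        if len(removed) == 1
    }
    pair_excess = {}
    for a, b in combinations(sorted(single_delta), 2):
        pair_key = frozenset({a, b})
        if pair_key in auc_without:
            pair_delta = auc_full - auc_without[pair_key]
            pair_excess[(a, b)] = pair_delta - (single_delta[a] + single_delta[b])
    return single_delta, pair_excess


# Hypothetical AUCs for two markers, with and without each one removed.
auc_without = {
    frozenset({"marker_a"}): 0.80,
    frozenset({"marker_b"}): 0.90,
    frozenset({"marker_a", "marker_b"}): 0.60,
}
singles, excess = ablation_deltas(0.94, auc_without)
print({k: round(v, 2) for k, v in singles.items()})
print({k: round(v, 2) for k, v in excess.items()})
```

In this made-up example, removing the two markers together costs 0.34 AUC versus 0.18 for the two single removals combined, giving a positive excess of about 0.16, the pattern the abstract calls conditional complementarity. A negative excess would instead point toward redundancy, as described for the lipid cluster.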